Data Science @ Personio
Note
No Personio data used in this presentation. The problem discussed motivated the research, but the models presented below will use fake or public data.
We will motivate the work with a discussion of App performance measurements and customer feedback metrics.
How do they influence one another, and how can we best distil the relationship between the two?
Note
Acknowledgements: The work on the BVAR discussed here took inspiration and borrowed heavily from the PYMC labs blogpost here. The example PRs benefited from the feedback of the PYMC devs.
Warning
Not an Economist!!!
“Balancing feedback loops are equilibrating or goal-seeking structures in systems and are both sources of stability and resistance to change”
“Allowing performance standards to be influenced by past performance… sets up a reinforcing feedback of eroding goals that sets a system drifting toward low performance.”
“…this justifies linear models as approximations” - Hansen Econometrics
The coefficients in \(Y_{t} = \mu + \sum_{j=0}^{\infty}b_{j}e_{t-j}\) are known as the impulse response function (IRF), representing the change in \(Y_{t + j}\) due to a shock \(e_{t}\) at time \(t\).
They can be recovered from the AR representation by recursive reconstruction.
IRFs are often reported scaled by the standard deviation of the shocks.
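The recursive reconstruction can be sketched in a few lines of plain numpy; the AR(2) coefficients below are illustrative, not estimates from any data:

```python
import numpy as np

def irf_from_ar(ar_coefs, horizon):
    """Recover the moving-average weights b_j (the IRF) from AR coefficients
    via the recursion b_0 = 1, b_j = a_1 b_{j-1} + a_2 b_{j-2} + ..."""
    b = [1.0]
    for j in range(1, horizon + 1):
        b.append(sum(a * b[j - 1 - i] for i, a in enumerate(ar_coefs) if j - 1 - i >= 0))
    return np.array(b)

# Illustrative stationary AR(2): y_t = 0.5 y_{t-1} + 0.2 y_{t-2} + e_t
irf = irf_from_ar([0.5, 0.2], horizon=10)
```

For a stationary process these weights decay toward zero, which is exactly the "shock dies out" behaviour the IRF plots below visualise.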
“The Child is the father of the Man” - Wordsworth
\[ Y \sim TruncNorm(\color{purple}{\mu, \sigma, 0, \infty}) \\ \color{purple}{\mu} = \color{green}{AR_{\mu}} + f(\color{red}{seasonality}) + f(\color{CornflowerBlue}{trend}) + ... \\ \color{purple}{\sigma} \sim InverseGamma(3, .5) \\ trend = \color{CornflowerBlue}{\alpha + \beta \cdot T_{i}} \\ \color{CornflowerBlue}{\alpha} \sim Normal(7, 1) \\ \color{CornflowerBlue}{\beta} \sim Normal(0.009, .1) \\ seasonality = \color{red}{FourierFeatures^{'}\beta_{fourier}} \\ \color{red}{\beta_{fourier}} \sim Normal(0, 0.5) \\ \color{green}{AR_{\mu}} \sim \color{green}{\mu_{ar} + a_{1}\cdot Y_{t-1} + a_{2} \cdot Y_{t-2} ... + \gamma} \]
\[ p(\theta | \mathbf{Data}) \propto p(\theta)\cdot p(\mathbf{Data} | \theta) \]
with AR:
    ## Data containers to enable prediction
    t = pm.MutableData("t", t_data, dims="obs_id")
    y = pm.MutableData("y", ar_data, dims="obs_id")
    # AR priors: the first coefficient will be the intercept term
    coefs = pm.Normal("coefs", priors["coefs"]["mu"], priors["coefs"]["sigma"])
    sigma = pm.HalfNormal("sigma", priors["sigma"])
    ## Priors for the linear trend component
    alpha = pm.Normal("alpha", priors["alpha"]["mu"], priors["alpha"]["sigma"])
    beta = pm.Normal("beta", priors["beta"]["mu"], priors["beta"]["sigma"])
    ## Priors for seasonality
    beta_fourier = pm.Normal(
        "beta_fourier",
        mu=priors["beta_fourier"]["mu"],
        sigma=priors["beta_fourier"]["sigma"],
        dims="fourier_features",
    )
    # AR process: we need one init variable for each lag, hence size is variable too
    init = pm.Normal.dist(
        priors["init"]["mu"], priors["init"]["sigma"], size=priors["init"]["size"]
    )
    # Steps of the AR model minus the lags required given the specification
    ar1 = pm.AR(
        "ar",
        coefs,
        sigma=sigma,
        init_dist=init,
        constant=True,
        steps=t.shape[0] - (priors["coefs"]["size"] - 1),
        dims="obs_id",
    )
    # Trend component
    trend = pm.Deterministic("trend", alpha + beta * t, dims="obs_id")
    # Seasonality component
    fourier_terms = pm.MutableData("fourier_terms", ff)
    seasonality = pm.Deterministic(
        "seasonality", pm.math.dot(beta_fourier, fourier_terms), dims="obs_id"
    )
    ## Additive combination
    mu = ar1 + trend + seasonality
    # The likelihood
    outcome = pm.Normal("likelihood", mu=mu, sigma=sigma, observed=y, dims="obs_id")
    ## Sampling
    idata_ar = pm.sample_prior_predictive()
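The `ff` matrix passed to `fourier_terms` is not defined in the snippet; a minimal sketch of how such a Fourier feature matrix is commonly built follows, where the period and order are illustrative assumptions (e.g. weekly data with an annual cycle):

```python
import numpy as np

def fourier_features(t, period, order):
    # One row per sin/cos term at harmonics 1..order of the seasonal period;
    # columns correspond to time points, matching pm.math.dot(beta_fourier, ff)
    freqs = 2 * np.pi * np.arange(1, order + 1)[:, None] * t[None, :] / period
    return np.vstack([np.sin(freqs), np.cos(freqs)])

t_data = np.arange(104)
ff = fourier_features(t_data, period=52, order=2)  # shape (2 * order, len(t_data))
```

Each `beta_fourier` coefficient then weights one of these deterministic basis rows, so the seasonal component stays periodic by construction.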
Autoregressive BSTS prior and posterior fits. For details see PYMC docs here
Causal Impact analysis; for more detail see CausalPy
\[ \mathbf{y}_{t} = \nu + A_{1}\mathbf{y}_{t-1} + A_{2}\mathbf{y}_{t-2} + \dots + A_{p}\mathbf{y}_{t-p} + \mathbf{e}_{t} \]
Wold Decomposition: \[ \mathbf{y}_{t} = \mu + \mathbf{e}_{t} + \Omega_{1}\mathbf{e}_{t-1} + \Omega_{2}\mathbf{e}_{t-2} + \dots \]
A VAR can have n lags and m equations such that the lagged terms of each series appear in all equations.
\[ gdp_{t} = \beta_{gdp1} \cdot gdp_{t-1} + \beta_{gdp2} \cdot gdp_{t-2} + \color{red}{\beta_{cons1}} \cdot cons_{t-1} + \color{red}{\beta_{cons2}} \cdot cons_{t-2} + \epsilon_{gdp}\] \[ cons_{t} = \beta_{cons1} \cdot cons_{t-1} + \beta_{cons2} \cdot cons_{t-2} + \color{red}{\beta_{gdp1}} \cdot gdp_{t-1} + \color{red}{\beta_{gdp2}} \cdot gdp_{t-2} + \epsilon_{cons}\]
As such the coefficients on the cross-equation variables are of particular interest when investigating the influence of one series on another.
The multivariate IRF can be constructed analogously to the univariate case, giving an interpretation of, e.g., the change in \(gdp_{t}\) due to a shock in \(cons_{t-2}\).
This opens up the possibility of testing how influence between variables percolates over time.
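The multivariate recursion mirrors the univariate one, with coefficient matrices in place of scalars: \(\Omega_{0} = I\) and \(\Omega_{j} = \sum_{i} A_{i}\Omega_{j-i}\). A numpy sketch, with made-up coefficient matrices for a two-equation VAR(2):

```python
import numpy as np

def var_irf(A_list, horizon):
    """Impulse response matrices for a VAR(p): Omega_0 = I,
    Omega_j = sum over i of A_i @ Omega_{j-i} for i <= min(j, p)."""
    k = A_list[0].shape[0]
    omegas = [np.eye(k)]
    for j in range(1, horizon + 1):
        om = np.zeros((k, k))
        for i, A in enumerate(A_list, start=1):
            if j - i >= 0:
                om = om + A @ omegas[j - i]
        omegas.append(om)
    return omegas

# Illustrative coefficient matrices, not estimates
A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
A2 = np.array([[0.1, 0.0], [0.0, 0.1]])
irfs = var_irf([A1, A2], horizon=8)
```

The off-diagonal entries `irfs[h][0, 1]` then trace how a shock to the second series (e.g. consumption) propagates into the first (e.g. GDP) over `h` periods.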
Statsmodels implementation of IRF
### Define a helper function that constructs the autoregressive step for the
### marginal contribution of each lagged term in each of the time series equations
def calc_ar_step(lag_coefs, n_eqs, n_lags, df):
    ars = []
    for j in range(n_eqs):
        ar = pm.math.sum(
            [
                pm.math.sum(lag_coefs[j, i] * df.values[n_lags - (i + 1) : -(i + 1)], axis=-1)
                for i in range(n_lags)
            ],
            axis=0,
        )
        ars.append(ar)
    betaX = pm.math.stack(ars, axis=-1)
    return betaX
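The lag-alignment slicing inside `calc_ar_step` can be checked with plain numpy on toy data (the values are illustrative): for lag `i + 1`, `df.values[n_lags - (i + 1) : -(i + 1)]` pairs each modelled observation with its `(i + 1)`-step-lagged predecessors.

```python
import numpy as np

# 5 time points, 2 series; with n_lags = 2 the model explains rows 2..4
vals = np.arange(10).reshape(5, 2)
n_lags = 2
lag1 = vals[n_lags - 1 : -1]  # one-step lags aligned with rows 2..4
lag2 = vals[n_lags - 2 : -2]  # two-step lags aligned with rows 2..4
```

Both slices have length `len(vals) - n_lags`, matching the `df.values[n_lags:]` observed data used in the model below.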
VAR(2)
with pm.Model(coords=coords) as model:
    lag_coefs = pm.Normal(
        "lag_coefs",
        mu=priors["lag_coefs"]["mu"],
        sigma=priors["lag_coefs"]["sigma"],
        dims=["equations", "lags", "cross_vars"],
    )
    alpha = pm.Normal(
        "alpha", mu=priors["alpha"]["mu"], sigma=priors["alpha"]["sigma"], dims=("equations",)
    )
    data_obs = pm.Data("data_obs", df.values[n_lags:], dims=["time", "equations"], mutable=True)
    betaX = calc_ar_step(lag_coefs, n_eqs, n_lags, df)
    betaX = pm.Deterministic("betaX", betaX, dims=["time"])
    mean = alpha + betaX
    if mv_norm:
        n = df.shape[1]
        ## Under the hood the LKJ prior retains the correlation matrix too.
        noise_chol, _, _ = pm.LKJCholeskyCov(
            "noise_chol",
            eta=priors["noise_chol"]["eta"],
            n=n,
            sd_dist=pm.HalfNormal.dist(sigma=priors["noise_chol"]["sigma"]),
        )
        obs = pm.MvNormal(
            "obs", mu=mean, chol=noise_chol, observed=data_obs, dims=["time", "equations"]
        )
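The remark that the LKJ prior retains the correlation matrix can be illustrated with plain numpy: given any Cholesky factor of the noise covariance, the correlations are recoverable (the factor below is made up for illustration):

```python
import numpy as np

chol = np.array([[1.0, 0.0], [0.43, 0.9]])  # a made-up lower-triangular factor
cov = chol @ chol.T                          # implied covariance matrix
stds = np.sqrt(np.diag(cov))
corr = cov / np.outer(stds, stds)            # implied correlation matrix
```

In PyMC this bookkeeping is done for you: `pm.LKJCholeskyCov` with `compute_corr=True` returns the correlation matrix and standard deviations alongside the Cholesky factor, which is why the model unpacks three values as `noise_chol, _, _`.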
Macroeconomic Data
|      | GDP  | CONS |
|------|------|------|
| GDP  | 1    | 0.43 |
| CONS | 0.43 | 1    |
Compare
Sampling Chains
“Those who only know one country, know no country” – Seymour Martin Lipset
Shrinkage to Centre Effect
groups = df[group_field].unique()
with pm.Model(coords=coords) as model:
    ## Hierarchical priors
    rho = pm.Beta("rho", alpha=2, beta=2)
    alpha_hat_location = pm.Normal("alpha_hat_location", 0, 0.1)
    alpha_hat_scale = pm.InverseGamma("alpha_hat_scale", 3, 0.5)
    beta_hat_location = pm.Normal("beta_hat_location", 0, 0.1)
    beta_hat_scale = pm.InverseGamma("beta_hat_scale", 3, 0.5)
    omega_global, _, _ = pm.LKJCholeskyCov(
        "omega_global", n=n_eqs, eta=1.0, sd_dist=pm.Exponential.dist(1)
    )
    ## Group-specific parameters
    for grp in groups:
        df_grp = df[df[group_field] == grp][cols]
        z_scale_beta = pm.InverseGamma(f"z_scale_beta_{grp}", 3, 0.5)
        z_scale_alpha = pm.InverseGamma(f"z_scale_alpha_{grp}", 3, 0.5)
        lag_coefs = pm.Normal(
            f"lag_coefs_{grp}",
            mu=beta_hat_location,
            # Hierarchical offset
            sigma=beta_hat_scale * z_scale_beta,
            dims=["equations", "lags", "cross_vars"],
        )
        alpha = pm.Normal(
            f"alpha_{grp}",
            mu=alpha_hat_location,
            # Hierarchical offset
            sigma=alpha_hat_scale * z_scale_alpha,
            dims=("equations",),
        )
        betaX = calc_ar_step(lag_coefs, n_eqs, n_lags, df_grp)
        betaX = pm.Deterministic(f"betaX_{grp}", betaX)
        mean = alpha + betaX
        n = df_grp.shape[1]
        noise_chol, _, _ = pm.LKJCholeskyCov(
            f"noise_chol_{grp}", eta=10, n=n, sd_dist=pm.Exponential.dist(1)
        )
        # Hierarchical offset: convex combination of global and group covariance
        omega = pm.Deterministic(f"omega_{grp}", rho * omega_global + (1 - rho) * noise_chol)
        obs = pm.MvNormal(f"obs_{grp}", mu=mean, chol=omega, observed=df_grp.values[n_lags:])
Country Specific Estimates
Global Correlations
Posterior Predictive Fits by country. See PYMC docs here